AITopics | function value gap

function value gap

Reviews: A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems, Jan-27-2025, 09:16:03 GMT

Originality: This paper provides a clear and deep analysis of a multistage accelerated SGD algorithm. The results show that the expected function value gap is bounded by an exponentially decaying term plus a sublinearly decaying term that depends on the noise. In the special case of a single stage and zero noise, the bound recovers the deterministic rate, while the noise term attains the lower bound O(\sigma^2/n). The paper contains enough novel results and is competitive with related work. In particular, the main results show when to switch from a constant stepsize to a decaying stepsize, a crucial choice for the overall performance of stochastic algorithms.
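
The review describes accelerated SGD that runs at a constant stepsize before switching to a decaying one. The following is a minimal sketch, assuming a strongly convex quadratic test problem, an additive Gaussian gradient-noise oracle, and a hypothetical schedule (halve the stepsize and double the stage length after each stage) chosen for illustration rather than taken from the paper. The names `multistage_agd` and `f` are likewise hypothetical. It records the function value gap f(x) - f* at the end of each stage.

```python
import numpy as np


def f(x, A):
    # Quadratic test objective f(x) = 0.5 * x^T A x; its minimizer is x* = 0, so f* = 0.
    return 0.5 * x @ A @ x


def multistage_agd(A, x0, alpha0, n0, n_stages, sigma, seed=0):
    """Hypothetical multistage accelerated stochastic gradient loop.

    Stage 0 runs n0 iterations at the constant stepsize alpha0. Each later stage
    halves the stepsize and doubles the number of iterations (an assumed schedule,
    not necessarily the one analyzed in the reviewed paper).
    Returns the function value gap f(x) - f* at the end of every stage.
    """
    rng = np.random.default_rng(seed)
    mu = np.linalg.eigvalsh(A).min()  # strong convexity constant of f
    x_prev = x0.astype(float)  # astype returns a copy, so x0 is not modified
    x = x0.astype(float)
    alpha, n_iters = alpha0, n0
    gaps = []
    for _stage in range(n_stages):
        # Nesterov momentum coefficient for a mu-strongly convex f with stepsize alpha.
        q = np.sqrt(mu * alpha)
        beta = (1.0 - q) / (1.0 + q)
        for _ in range(n_iters):
            y = x + beta * (x - x_prev)
            # Stochastic gradient oracle (assumed): exact gradient plus Gaussian noise.
            g = A @ y + sigma * rng.standard_normal(x.shape)
            x_prev, x = x, y - alpha * g
        gaps.append(f(x, A))  # gap = f(x) - f*, with f* = 0 here
        alpha, n_iters = alpha / 2.0, n_iters * 2
    return gaps


if __name__ == "__main__":
    A = np.diag([1.0, 10.0])  # mu = 1, L = 10
    x0 = np.array([5.0, 5.0])
    # alpha0 = 1/L; the later, smaller-stepsize stages are meant to shrink the noise term.
    print(multistage_agd(A, x0, alpha0=0.1, n0=50, n_stages=5, sigma=0.1))
```

With `sigma=0.0`, the gap decays geometrically, mirroring the deterministic special case. With `sigma > 0`, the constant-stepsize stage stalls at a noise floor, and the decaying stages are what push the gap further down.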

algorithm, function value gap, multistage accelerated stochastic gradient method

Genre: Research Report > New Finding (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.44)

On the Last-Iterate Convergence of Shuffling Gradient Methods

Liu, Zijian; Zhou, Zhengyuan

arXiv.org Machine Learning, Mar-12-2024

Shuffling gradient methods, also known as stochastic gradient descent (SGD) without replacement, are widely implemented in practice, most notably in three popular algorithms: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). Despite their empirical success, the theoretical guarantees of shuffling gradient methods were not well understood for a long time. Until recently, convergence rates had only been established for the average iterate on convex functions and for the last iterate on strongly convex problems (using the squared distance to the optimum as the metric). However, when the function value gap is used as the convergence criterion, existing theory cannot explain the good performance of the last iterate in different settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove last-iterate convergence rates for shuffling gradient methods with respect to the objective value, even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
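
RR, SO, and IG differ only in how each epoch orders its pass over the n component functions. The following is a minimal sketch, assuming a finite-sum least-squares objective and a constant stepsize; the paper's stepsize choices and constrained setting are not reproduced. The names `shuffling_gd` and `objective` are hypothetical. The last-iterate function value gap is f(x_T) - f*, with f* taken from an exact least-squares solve.

```python
import numpy as np


def objective(x, A, b):
    # Finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def shuffling_gd(A, b, x0, alpha, n_epochs, mode, seed=0):
    """Run n_epochs passes of a without-replacement method; return the last iterate.

    mode="RR": draw a fresh random permutation every epoch (Random Reshuffle).
    mode="SO": draw one random permutation and reuse it (Shuffle Once).
    mode="IG": always use the fixed order 0, 1, ..., n-1 (Incremental Gradient).
    """
    if mode not in ("RR", "SO", "IG"):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    fixed_order = rng.permutation(n) if mode == "SO" else np.arange(n)
    x = np.array(x0, dtype=float)
    for _ in range(n_epochs):
        order = rng.permutation(n) if mode == "RR" else fixed_order
        for i in order:
            # Step on the gradient of the i-th component 0.5 * (a_i^T x - b_i)^2.
            x = x - alpha * (A[i] @ x - b[i]) * A[i]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.standard_normal((20, 3))
    b = rng.standard_normal(20)
    # f* from the exact least-squares solution.
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    f_star = objective(x_star, A, b)
    for mode in ("RR", "SO", "IG"):
        x_last = shuffling_gd(A, b, np.zeros(3), alpha=0.01, n_epochs=200, mode=mode)
        print(mode, "last-iterate gap:", objective(x_last, A, b) - f_star)
```

Only the `order` line changes between the three methods, which is why a single analysis can cover all of them. With a constant stepsize on an inconsistent system, the last iterate typically settles at a small stepsize-dependent gap rather than reaching f* exactly.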

algorithm 1, assumption 3, theorem 4

arXiv:2403.07723

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
